05. Comparison with a Deep Neural Network
A traditional computer vision pipeline for object detection typically consists of separate processing stages, such as feature extraction, spatial sampling, and classification. Different algorithms and techniques can be applied at each stage, and the parameters for each stage are usually tuned by hand.
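To make the idea of separate stages concrete, here is a minimal sketch of such a pipeline in plain Python. Everything in it is an illustrative assumption rather than a method from this lesson: the function names, the toy mean/range descriptor, and the threshold rule that stands in for a trained classifier. The point is only that each stage is its own function with its own hand-set parameters.

```python
from typing import List, Tuple

Image = List[List[float]]            # grayscale image as rows of pixel values
Window = Tuple[int, int, int, int]   # (x, y, width, height)


def sliding_windows(img: Image, size: int, stride: int) -> List[Window]:
    """Spatial sampling: enumerate square windows over the image."""
    height, width = len(img), len(img[0])
    return [
        (x, y, size, size)
        for y in range(0, height - size + 1, stride)
        for x in range(0, width - size + 1, stride)
    ]


def extract_features(img: Image, win: Window) -> List[float]:
    """Feature extraction: a toy descriptor (mean and range of intensity)."""
    x, y, w, h = win
    pixels = [img[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]


def classify(features: List[float], mean_threshold: float) -> bool:
    """Classification: a hand-tuned rule standing in for a trained classifier."""
    return features[0] >= mean_threshold


def detect(img: Image, size: int = 2, stride: int = 1,
           mean_threshold: float = 0.75) -> List[Window]:
    """Run the three stages in sequence and keep the windows classified positive."""
    return [
        win
        for win in sliding_windows(img, size, stride)
        if classify(extract_features(img, win), mean_threshold)
    ]


# Hypothetical 4x4 image with a bright 2x2 "object" in the bottom-right corner.
example_image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
detections = detect(example_image)
```

Note that the window size, the stride, and the threshold are all chosen by hand here, and swapping the descriptor or the classifier leaves the other stages unchanged.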
In comparison, a Deep Neural Network designed for object detection performs these tasks within a single interconnected architecture, where the stages are not as distinct. The lower layers of the network, i.e. the ones closer to the input data, typically perform some equivalent of feature extraction, while the higher layers can localize and classify simultaneously to produce detections.
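The layered structure described above can be sketched with a toy forward pass in plain Python. This is a deliberately tiny illustration under stated assumptions, not a real detector: the single convolution kernel and the head weights (`w_cls`, `b_cls`, `w_box`) are hypothetical hand-picked values, whereas an actual network has many layers and learns its weights from labelled data. What it shows is a lower layer computing a feature map and a higher layer producing a class score and a box together for each cell of that map.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
Detection = Tuple[float, Tuple[int, int, float, float]]  # (score, (x, y, w, h))


def conv2d_relu(img: Image, kernel: Image) -> Image:
    """Lower layer: 'valid' 2D cross-correlation followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(len(img) - kh + 1):
        row = []
        for x in range(len(img[0]) - kw + 1):
            s = sum(kernel[i][j] * img[y + i][x + j]
                    for i in range(kh) for j in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def detection_head(fmap: Image, w_cls: float, b_cls: float,
                   w_box: float) -> List[Detection]:
    """Higher layer: each feature-map cell yields a class score and a box at once."""
    outputs = []
    for y, row in enumerate(fmap):
        for x, f in enumerate(row):
            score = 1.0 / (1.0 + math.exp(-(w_cls * f + b_cls)))  # sigmoid
            side = w_box * f                                      # toy box regression
            outputs.append((score, (x, y, side, side)))
    return outputs


def forward(img: Image, score_threshold: float = 0.5) -> List[Detection]:
    """Input to detections in one pass, with no hand-off between separate tools."""
    kernel = [[0.25, 0.25], [0.25, 0.25]]  # assumed weights (a 2x2 averaging filter)
    fmap = conv2d_relu(img, kernel)
    outputs = detection_head(fmap, w_cls=10.0, b_cls=-7.5, w_box=2.0)
    return [d for d in outputs if d[0] > score_threshold]


# Hypothetical 4x4 image with a bright 2x2 "object" in the bottom-right corner.
net_image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
net_detections = forward(net_image)
```

Unlike a hand-built pipeline, none of these numbers would normally be set by a person: training adjusts the kernel and head weights jointly, which is why the boundary between "feature extraction" and "classification" becomes blurred.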
Both approaches have their strengths and weaknesses in terms of performance, development effort, interpretability, and so on. For a more in-depth discussion of how Deep Learning is being applied to object detection tasks, you may refer to the following papers (and the references therein):
J. Redmon et al., 2015. You Only Look Once: Unified, Real-Time Object Detection. [arXiv]
W. Liu et al., 2015. SSD: Single Shot MultiBox Detector. [arXiv]
S. Ren et al., 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. [arXiv]